European Heart Journal - Digital Health — Latest Matching Preprints

1

Developing a phenotype risk score for TTR V142I to capture undiagnosed variant transthyretin amyloidosis in health systems

Sarkar, D.; Ferar, K. D.; Syed, M. G.; Bastarache, L.; Kenny, E. E.; Abul-Husn, N. S.; Pejaver, V.; Kontorovich, A. R.

2026-01-06 health systems and quality improvement 10.64898/2026.01.05.26343489 medRxiv

Top 0.1%

42.4%

Background: Phenotype Risk Scores (PheRS) leverage electronic health record (EHR) data to identify individuals at risk for Mendelian disorders, but their performance remains untested for diseases with common and/or non-specific features such as variant transthyretin amyloidosis (ATTRv), which often presents with heart failure (HF), atrial fibrillation, polyneuropathy, and other prevalent diagnoses. We optimized a PheRS for the most common form of ATTRv by integrating genomic and clinical data in Mount Sinai's BioMe biobank, focusing on expert-driven phenotype definitions for the TTR variant p.Val142Ile (V142I), which is prevalent in African American (AA) populations (4%). Methods: We developed and evaluated a customized PheRS for ATTRv that incorporated 21 expert-curated phenotypic features, including 292 ICD-9 and ICD-10 diagnosis codes, on a biobank cohort of V142I+ cases (n=383) and controls without any pathogenic/likely pathogenic TTR variants (n=30,642). We compared its performance with the standard automated PheRS approach using different metrics. To account for age-dependent penetrance and high lifelong risk of HF, we further tested the customized PheRS for V142I in a subset of individuals aged ≥ 60 with self-reported Black or AA race/ethnicity and at least one occurrence of HF in their EHRs. Results: The expert-curated PheRS outperformed the standard PheRS as measured by improved precision-at-k (0.05 vs. 0.00; k=100), a clinically relevant metric. In the subcohort enriched for anticipated penetrance (older, Black/AA HF patients), the expert-curated PheRS identified more V142I+ individuals (6.0%) among the top 100-scoring individuals than a strategy that randomly sampled from the population (3.6%).
Conclusion: This work demonstrates that standard PheRS methods are insufficient for common, adult-onset cardiovascular genetic diseases such as V142I-related ATTRv, but when redesigned with disease biology, ancestry, age, and clinical context in mind, PheRS become clinically actionable tools for precision cardiology.
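
The precision-at-k metric reported in this abstract can be illustrated with a minimal sketch. The function name and the toy scores and labels below are hypothetical and are not taken from the preprint; the sketch only assumes the usual definition, namely the fraction of true carriers among the k highest-scoring individuals. Under that definition, a value of 0.05 at k=100 corresponds to 5 carriers among the top 100.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true cases (label 1) among the k highest-scoring individuals."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the cohort size")
    # Rank individuals from highest to lowest score (ties keep input order).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


# Toy cohort (hypothetical values): six individuals, three of them carriers.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 0, 0, 1]
print(precision_at_k(toy_scores, toy_labels, k=2))
```

Because the metric looks only at the top of the ranking, it reflects how many carriers a clinic would find if it could follow up on just k patients, which is why it can differ sharply between two scores with similar overall discrimination.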

2

An Ensemble Deep Learning Algorithm for Structural Heart Disease Screening Using Electrocardiographic Images: PRESENT SHD

Dhingra, L. S.; Aminorroaya, A.; Sangha, V.; Pedroso Camargos, A.; Vasisht Shankar, S.; Coppi, A.; Foppa, M.; Brant, L. C. C.; Barreto, S. M.; Ribeiro, A. L. P.; Krumholz, H.; Oikonomou, E. K.; Khera, R.

2024-10-07 cardiovascular medicine 10.1101/2024.10.06.24314939 medRxiv

Top 0.1%

40.6%

Background: Identifying structural heart diseases (SHDs) early can change the course of the disease, but their diagnosis requires cardiac imaging, which is limited in accessibility. Objective: To leverage images of 12-lead ECGs for automated detection and prediction of multiple SHDs using an ensemble deep learning approach. Methods: We developed a series of convolutional neural network models for detecting a range of individual SHDs from images of ECGs, with SHDs defined by transthoracic echocardiograms (TTEs) performed within 30 days of the ECG at the Yale New Haven Hospital (YNHH). SHDs were defined as LV ejection fraction <40%, moderate-to-severe left-sided valvular disease (aortic/mitral stenosis or regurgitation), or severe left ventricular hypertrophy (IVSd > 1.5 cm and diastolic dysfunction). We developed an ensemble XGBoost model, PRESENT-SHD, as a composite screen across all SHDs. We validated PRESENT-SHD at 4 US hospitals and in the prospective, population-based Brazilian Longitudinal Study of Adult Health (ELSA-Brasil), with concurrent protocolized ECGs and TTEs. We also used PRESENT-SHD for risk stratification of new-onset SHD or heart failure (HF) in clinical cohorts and the population-based UK Biobank (UKB). Results: The models were developed using 261,228 ECGs from 93,693 YNHH patients and evaluated on a single ECG from 11,023 individuals at YNHH (19% with SHD), 44,591 across external hospitals (20-27% with SHD), and 3,014 in ELSA-Brasil (3% with SHD). In the held-out test set, PRESENT-SHD demonstrated an AUROC of 0.886 (0.877-0.894), 90% sensitivity, and 66% specificity. At hospital-based sites, PRESENT-SHD had AUROCs ranging from 0.854 to 0.900, with sensitivities and specificities of 93-96% and 51-56%, respectively. The model generalized well to ELSA-Brasil (AUROC, 0.853 [0.811-0.897], 88% sensitivity, 62% specificity).
PRESENT-SHD demonstrated consistent performance across demographic subgroups, novel ECG formats, and smartphone photographs of ECGs from monitors and printouts. A positive PRESENT-SHD screen portended a 2- to 4-fold higher risk of new-onset SHD/HF, independent of demographics, comorbidities, and the competing risk of death across clinical sites and UKB, with high predictive discrimination. Conclusion: We developed and validated PRESENT-SHD, an AI-ECG tool identifying a range of SHD using images of 12-lead ECGs, representing a robust, scalable, and accessible modality for automated SHD screening and risk stratification. Condensed Abstract: Screening for structural heart disorders (SHDs) requires cardiac imaging, which has limited accessibility. To leverage 12-lead ECG images for automated detection and prediction of multiple SHDs, we developed PRESENT-SHD, an ensemble deep learning model. PRESENT-SHD demonstrated excellent performance in detecting SHDs across 5 US hospitals and a population-based cohort in Brazil. The model successfully predicted the risk of new-onset SHD or heart failure in both US clinical cohorts and the community-based UK Biobank. By using ubiquitous ECG images and smartphone photographs to predict a composite outcome of multiple SHDs, PRESENT-SHD establishes a scalable paradigm for cardiovascular screening and risk stratification.

3

Reliability of Artificial Intelligence-enhanced Electrocardiography

Dhingra, L. S.; Croon, P. M.; Batinica, B.; Aminorroaya, A.; Pedroso, A. F.; Oikonomou, E. K.; Khera, R.

2025-11-06 cardiovascular medicine 10.1101/2025.11.04.25339526 medRxiv

Top 0.1%

32.9%

Background: The scientific literature on artificial intelligence-enabled electrocardiography (AI-ECG) has defined a robust performance of AI models in detecting and predicting several structural heart disorders (SHDs) using ECGs. However, as with any diagnostic test, the real-world clinical utility of AI-ECG requires that its results be consistent when repeated under similar conditions. Aim: To evaluate the reliability of AI-ECG models for different ECGs for the same person, across different diagnostic labels, and using varied modeling approaches. Methods: We used ECG images (2000-2024) from 5 hospitals and an outpatient network within a large, integrated US health system. For each individual, we identified multiple ECGs recorded within a 30-day period. We evaluated 7 models: 6 convolutional neural networks (CNNs) trained to detect individual SHDs, including LV systolic dysfunction, left-sided valve diseases, and severe LVH; and an ensemble XGBoost integrating individual CNNs as a composite screen for multiple SHDs. We used concordance correlation coefficient (CCC), Spearman correlation, Cohen's kappa, and percent agreement in binary screen status to test model reliability. We evaluated factors associated with different AI-ECG outputs (Δ probability > 0.5) and assessed stability across ECG layouts (digital, printed, photo). Results: Across sites, we identified 1,118,263 ECG pairs, with a median of 1 (1-3) day between ECGs. The ensemble XGBoost had higher test-retest correlation (CCC: 0.89-0.92) and agreement (kappa: 0.75-0.82) between pairs compared with CNNs (CCC: 0.78-0.88; kappa: 0.57-0.72). After adjusting for demographics, ECG pairs that included one or two inpatient ECGs were significantly more likely to yield unstable predictions (ORs: 1.60 [1.50-1.70] and 1.91 [1.78-2.05], respectively) compared with pairs with both ECGs obtained in outpatient settings.
Among outpatient pairs across sites, the XGBoost model had a CCC of 0.89-0.94, a Spearman correlation of 0.90-0.94, and a kappa of 0.78-0.84, with concordance rates of 89-92%. Notably, ensemble model predictions were also stable across different ECG layouts. Conclusion: An ensemble AI-ECG model integrating multiple CNN predictions had higher reliability compared with models for individual disorders. Discordance was more common in inpatient ECGs, suggesting instability in high-acuity settings. Reliable ensemble AI-ECG model outputs support readiness for clinical implementation for SHD screening. Graphical Abstract: [Figure 1: Study Design]. Abbreviations: AR, aortic regurgitation; AS, aortic stenosis; CNN, convolutional neural network; ECG, electrocardiogram; FC, fully-connected layers; LVSD, left ventricular systolic dysfunction; MR, mitral regurgitation; SHD, structural heart diseases; sLVH, severe left ventricular hypertrophy; XGBoost, extreme gradient boosting.
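
Two of the reliability metrics named in this abstract, the concordance correlation coefficient and Cohen's kappa, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes Lin's definition of the CCC with population (divide-by-n) moments and the two-rater binary form of kappa, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def concordance_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    # Unlike Pearson r, the squared mean difference penalizes systematic shifts.
    return 2 * cov_xy / denom


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) screen results on the same pairs."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be paired and non-empty")
    n = len(a)
    observed = sum(1 for u, v in zip(a, b) if u == v) / n
    rate_a, rate_b = sum(a) / n, sum(b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)  # chance agreement
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

The CCC is applied to the paired model probabilities from the two ECGs, while kappa is applied after thresholding each probability into a positive or negative screen, which is one reason the two metrics can rank models differently.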

4

An Artificial Intelligence Model for Detection of Heart Failure with Preserved Ejection Fraction: A Report from HeartShare Study

Karabayir, I.; Singh, S.; Hayit, T.; Soliman, E. Z.; Kitzman, D.; Herrington, D. M.; Borlaug, B. A.; Davis, R. L.; Shah, S. J.; Akbilgic, O.

2025-11-30 cardiovascular medicine 10.1101/2025.11.26.25341127 medRxiv

Top 0.1%

32.8%

Background: Heart failure with preserved ejection fraction (HFpEF) accounts for over half of all heart failure cases in the United States and remains a diagnostic challenge. Non-invasive, scalable screening tools may enable earlier recognition, timely intervention, and improved care. Objective: To evaluate the performance, reproducibility, and early detection capability of an electrocardiogram-based artificial intelligence (ECG-AI) model designed to identify HFpEF using HeartShare data and real-world ECGs from Wake Forest Baptist Health (WFBH). Methods: The original ECG-AI model was developed and validated using >1 million ECGs. In this study, we examined the external validity and reproducibility over time of this ECG-AI measure in 432 participants from an NIH-funded study of clinically validated HFpEF or controls (HeartShare). Specifically, we assessed model accuracy (AUC, sensitivity, specificity, predictive values) and reproducibility across three serial ECGs. We also analyzed the potential for early (preclinical) detection of HFpEF in 59,705 real-world ECGs from 12,338 patients in a large integrated healthcare system (WFBH). Results: In HeartShare, ECG-AI achieved an AUC of 0.760 (95% CI: 0.729-0.816), with 65% sensitivity and 75% specificity for detection of HFpEF. The AUC was not significantly different when using only the lead I ECG as input (0.773 [0.729-0.816]). Within-patient reproducibility across three consecutive ECGs showed strong correlations (Pearson r = 0.87-0.89) and strong agreement (Cohen's κ = 0.68-0.74). Misclassified cases showed fewer risk factors and more normal-like ECG features. In real-world WFBH data, ECG-AI detected HFpEF up to 4 years before clinical diagnosis with AUCs from 0.77 to 0.80. Conclusions: The 12-lead ECG-AI model demonstrates strong generalizability, reproducibility, and early detection capabilities for HFpEF, supporting its potential as a scalable screening and risk stratification tool.
The almost identical single-lead AUC warrants future investigation for remote monitoring. What is new? This study is the first to demonstrate that an ECG-AI model for HFpEF maintains strong temporal reproducibility across serial ECGs, supporting its stability as a robust non-invasive tool. We validate the model in both a rigorously phenotyped cohort and a large real-world health system and show that single-lead ECG input achieves accuracy comparable to the full 12-lead model. In addition, we show that the model can identify HFpEF years before clinical diagnosis, extending prior work by establishing ECG-AI as a reproducible, generalizable, and potentially preclinical detection tool. What are the clinical implications? The strong temporal reproducibility of the ECG-AI measure indicates that it can provide reliable longitudinal tracking of HFpEF risk, making it suitable for both clinical monitoring and remote assessment. Early detection capabilities, up to four years before diagnosis, create opportunities for proactive evaluation and earlier intervention. The comparable performance of single-lead ECGs also opens the door for scalable deployment through wearable or home-based devices, broadening access to HFpEF screening and enabling continuous risk surveillance outside traditional clinical environments.

5

Artificial intelligence-guided detection of under-recognized cardiomyopathies on point-of-care cardiac ultrasound

Oikonomou, E. K.; Holste, G.; Coppi, A.; McNamara, R. L.; Nadkarni, G.; Baloescu, C.; Krumholz, H.; Wang, Z.; Khera, R.

2024-03-15 cardiovascular medicine 10.1101/2024.03.10.24304044 medRxiv

Top 0.1%

32.1%

Background: Point-of-care ultrasonography (POCUS) enables cardiac imaging at the bedside and in communities but is limited by abbreviated protocols and variation in quality. We developed and tested artificial intelligence (AI) models to automate the detection of underdiagnosed cardiomyopathies from cardiac POCUS. Methods: In a development set of 290,245 transthoracic echocardiographic videos across the Yale-New Haven Health System (YNHHS), we used augmentation approaches and a customized loss function weighted for view quality to derive a POCUS-adapted, multi-label, video-based convolutional neural network (CNN) that discriminates HCM (hypertrophic cardiomyopathy) and ATTR-CM (transthyretin amyloid cardiomyopathy) from controls without known disease. We evaluated the final model across independent, internal and external, retrospective cohorts of individuals who underwent cardiac POCUS across YNHHS and Mount Sinai Health System (MSHS) emergency departments (EDs) (2011-2024) to prioritize key views and validate the diagnostic and prognostic performance of single-view screening protocols. Findings: We identified 33,127 patients (median age 61 [IQR: 45-75] years, n=17,276 [52.2%] female) at YNHHS and 5,624 (57 [IQR: 39-71] years, n=1,953 [34.7%] female) at MSHS with 78,054 and 13,796 eligible cardiac POCUS videos, respectively. An AI-enabled single-view screening approach successfully discriminated HCM (AUROC of 0.90 [YNHHS] & 0.89 [MSHS]) and ATTR-CM (AUROC of 0.92 [YNHHS] & 0.99 [MSHS]). In YNHHS, 40 (58.0%) HCM and 23 (47.9%) ATTR-CM cases had a positive screen at a median of 2.1 [IQR: 0.9-4.5] and 1.9 [IQR: 1.0-3.4] years before clinical diagnosis, respectively.
Moreover, among 24,448 participants without known cardiomyopathy followed over 2.2 [IQR: 1.1-5.8] years, AI-POCUS probabilities in the highest (vs lowest) quintile for HCM and ATTR-CM conferred a 15% (adj.HR 1.15 [95%CI: 1.02-1.29]) and 39% (adj.HR 1.39 [95%CI: 1.22-1.59]) higher age- and sex-adjusted mortality risk, respectively. Interpretation: We developed and validated an AI framework that enables scalable, opportunistic screening of treatable cardiomyopathies wherever POCUS is used. Funding: National Heart, Lung and Blood Institute, Doris Duke Charitable Foundation, BridgeBio. Research in Context. Evidence before this study: Point-of-care ultrasonography (POCUS) can support clinical decision-making at the point-of-care as a direct extension of the physical exam. POCUS has benefited from the increasing availability of portable and smartphone-adapted probes and even artificial intelligence (AI) solutions that can assist novices in acquiring basic views. However, the diagnostic and prognostic inference from POCUS acquisitions is often limited by the short acquisition duration, suboptimal scanning conditions, and limited experience in identifying subtle pathology that goes beyond the acute indication for the study. Recent solutions have shown the potential of AI-augmented phenotyping in identifying traditionally under-diagnosed cardiomyopathies on standard transthoracic echocardiograms performed by expert operators with strict protocols. However, these are not optimized for opportunistic screening using videos derived from typically lower-quality POCUS studies.
Given the widespread use of POCUS across communities, ambulatory clinics, emergency departments (ED), and inpatient settings, there is an opportunity to leverage this technology for diagnostic and prognostic inference, especially for traditionally under-recognized cardiomyopathies, such as hypertrophic cardiomyopathy (HCM) or transthyretin amyloid cardiomyopathy (ATTR-CM), which may benefit from timely referral for specialized care. Added value of this study: We present a multi-label, view-agnostic, video-based convolutional neural network adapted for POCUS use, which can reliably discriminate cases of ATTR-CM and HCM versus controls across more than 90,000 unique POCUS videos acquired over a decade across EDs affiliated with two large and diverse health systems. The model benefits from customized training that emphasizes low-quality acquisitions as well as off-axis, non-traditional views, outperforming view-specific algorithms and approaching the performance of standard TTE algorithms using single POCUS videos as the sole input. We further provide evidence that among reported controls, higher probabilities for HCM or ATTR-CM-like phenotypes are associated with worse long-term survival, suggesting possible under-diagnosis with prognostic implications. Finally, among confirmed cases with previously available POCUS imaging, positive AI-POCUS screens were seen at a median of 2 years before eventual confirmatory testing, highlighting an untapped potential for timely diagnosis through opportunistic screening. Implications of all available evidence: We define an AI framework with excellent performance in the automated detection of underdiagnosed yet treatable cardiomyopathies. This framework may enable scalable screening, detecting these disorders years before their clinical recognition, thus improving the diagnostic and prognostic inference of POCUS imaging in clinical practice.

6

Automated Echocardiographic Detection of Congenital Heart Disease Using Artificial Intelligence

Lukyanenko, P.; Ghelani, S. J.; Yang, Y.; Jiang, B.; Miller, T.; Harrild, D. M.; Sasaki, N.; Sperotto, F.; Sganga, D.; Triedman, J.; Powell, A.; Geva, T.; La Cava, W.; Mayourian, J.

2026-01-26 cardiovascular medicine 10.64898/2026.01.24.26344771 medRxiv

Top 0.1%

28.4%

Background: Delayed or missed diagnosis of congenital heart disease (CHD) contributes to excess pediatric mortality worldwide. Echocardiography (echo) is central to diagnosing and triaging CHD, yet expert interpretation remains a scarce and maldistributed global resource. Artificial intelligence (AI) offers the potential to democratize diagnostics and extend expert-level interpretation beyond large academic centers, but its application in CHD remains underexplored. Methods: We developed EchoFocus-CHD, an AI-enabled model for automated detection of 12 critical and 8 non-critical CHD lesions, individually and as composites. The composite critical CHD outcome was the primary endpoint. The model expands on a multi-task, view-agnostic architecture (PanEcho) with a transformer encoder to improve focus on relevant echo views. The model was trained (80%) and tested (20%) on the first echo per patient from Boston Children's Hospital (BCH), with external validation on US and international studies from patients referred to BCH. Results: The internal and external cohorts included 3.4 million videos from 54,727 echos (median age at echo 7.1 [IQR, 0.2-15.0] years; 5.8% critical CHD; 23.6% non-critical CHD) and 167,484 videos from 3,356 echos (median age at echo 2.5 [IQR, 0.3-9.4] years; 29.4% critical CHD; 45.6% non-critical CHD), respectively. EchoFocus-CHD showed excellent internal ability to detect the composite critical CHD outcome (AUROC 0.94, LR+ 7.50, LR- 0.14) and individual critical lesions (AUROC 0.83-1.00), as well as composite non-critical CHD (AUROC 0.90, LR+ 5.00, LR- 0.23) and individual non-critical lesions (AUROC 0.70-0.96). Performance in detecting critical CHD declined during external validation (AUROC 0.77), coinciding with greater expert disagreement on external cases (κ=0.72 versus 0.82 for internal cases).
Explainability analyses demonstrated that the model prioritized the same clinically relevant views (parasternal long-axis, parasternal short-axis, and subxiphoid long-axis) across internal and external cohorts, while UMAP analysis revealed a domain shift between cohorts. Retraining on all available US patients attenuated domain shift, improving international critical CHD detection (AUROC 0.87) and calibration. Conclusions: EchoFocus-CHD shows promise for automated CHD detection and highlights the need to address domain shift for real-world deployment. By identifying high-risk CHD lesions, this approach could support triage, prioritize expert review, and optimize resource allocation, advancing more equitable global cardiovascular care.
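
The likelihood ratios reported in this abstract can be turned into post-test probabilities with the standard odds form of Bayes' rule. The sketch below is illustrative arithmetic only: the function name is hypothetical, and combining the internal-cohort critical CHD prevalence (5.8%) with the internal LR+ (7.50) and LR- (0.14) is an assumption made here for demonstration, not a result reported by the authors.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio  # Bayes' rule in odds form
    return post_odds / (1 + post_odds)


# Assumed inputs taken from the abstract's internal cohort figures.
prevalence = 0.058
after_positive = post_test_probability(prevalence, 7.50)
after_negative = post_test_probability(prevalence, 0.14)
print(f"after a positive screen: {after_positive:.3f}")
print(f"after a negative screen: {after_negative:.4f}")
```

Under these assumed inputs, a positive screen raises the probability of critical CHD from about 6% to roughly 32%, and a negative screen lowers it to under 1%. The same likelihood ratios would give different post-test probabilities in the external cohort, where the prevalence is much higher.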

7

Artificial Intelligence-Enabled Electrocardiogram for Elevated Left Ventricular Filling Pressure

Lim, J.; Lee, M. S.; Suh, J. H.; Kang, S.; Lee, H. S.; Jang, J.-H.; Son, J. M.; Kwon, J.-M.; Kim, Y.-J.; Kim, K.-H.; Lee, S.-P.

2025-10-07 cardiovascular medicine 10.1101/2025.10.03.25337299 medRxiv

Top 0.1%

28.2%

Background: Left ventricular filling pressure (LVFP) is associated with heart failure symptoms and is a key prognostic marker and therapeutic target, but it is difficult to measure non-invasively. We aimed to develop and validate a deep learning-based artificial intelligence (AI) model using a standard 12-lead electrocardiogram (ECG) to detect elevated LVFP and assess its prognostic value. Methods: We trained an AI model to detect increased LVFP. Septal E/e' >15 on Doppler echocardiography was used to define increased LVFP and guide AI-ECG model training. The model was built upon a foundation model trained with >1 million multi-ethnic ECGs and fine-tuned on a development cohort of 225737 ECGs and 115982 echocardiograms from 92775 unique patients from two tertiary hospitals. The model performance was assessed in a separate internal population from the development cohort (n=9278) and an independent external cohort from another tertiary hospital (n=17926). The prognostic significance of the AI-ECG output value was evaluated via survival analyses using the internal and external hospital cohorts, as well as the UK Biobank (n=43347). Results: The AI-ECG model detected increased LVFP with an area under the curve of 0.868 (95% confidence interval [CI] 0.859-0.877) and 0.850 (95% CI 0.841-0.858) in the internal and external test cohorts, respectively. The model output was an independent predictor of mortality in all three cohorts (adjusted hazard ratio per 10-point increment: internal 1.31 [95% CI 1.23-1.38]; external 1.32 [95% CI 1.28-1.35]; UK Biobank 1.16 [95% CI 1.07-1.26]; all p<0.001). Its prognostic capability was comparable or superior to traditional echocardiographic parameters, particularly in patients with comorbidities.
Conclusions: The AI-ECG may enable identification of patients with increased LVFP and provide powerful prognostic information. Further prospective studies are warranted to evaluate its clinical utility. Clinical Perspective. What Is New? (1) By using a specific, broadly applicable echocardiographic marker, E/e' > 15, as the training target, our model circumvents the well-documented problems of indeterminate classifications and the exclusion of patients with atrial fibrillation that have constrained previous models. (2) The most significant added value is the extensive external validation. We built our model upon a state-of-the-art, multi-ethnic foundation model pre-trained on >1 million ECGs, and demonstrated the model's consistent high performance not only in an internal cohort but also in two independent, racially and geographically distinct external cohorts. This robust external validation directly confronts the critical challenge of generalizability. What Are the Clinical Implications? (1) The AI-ECG output value provides independent and meaningful prognostic information, with performance comparable or numerically superior to established traditional echocardiographic parameters. This was particularly evident in patients with comorbidities, where the role of traditional echocardiographic markers is often limited. (2) The AI-ECG may both enable population-level screening and enhance longitudinal management, offering an opportunity to identify at-risk individuals earlier and implement preventive strategies.

8

Scalable Risk Stratification for Heart Failure Using Artificial Intelligence applied to 12-lead Electrocardiographic Images: A Multinational Study

Dhingra, L. S.; Aminorroaya, A.; Sangha, V.; Pedroso Camargos, A.; Asselbergs, F. W.; Brant, L. C. C.; Barreto, S. M.; Ribeiro, A. L. P.; Krumholz, H.; Oikonomou, E. K.; Khera, R.

2024-04-03 cardiovascular medicine 10.1101/2024.04.02.24305232 medRxiv

Top 0.1%

28.0%

Background: Current risk stratification strategies for heart failure (HF) require either specific blood-based biomarkers or comprehensive clinical evaluation. In this study, we evaluated the use of artificial intelligence (AI) applied to images of electrocardiograms (ECGs) to predict HF risk. Methods: Across multinational longitudinal cohorts in the integrated Yale New Haven Health System (YNHHS) and in the population-based UK Biobank (UKB) and Brazilian Longitudinal Study of Adult Health (ELSA-Brasil), we identified individuals without HF at baseline. Incident HF was defined based on the first occurrence of an HF hospitalization. We evaluated an AI-ECG model that defines the cross-sectional probability of left ventricular dysfunction from a single image of a 12-lead ECG and its association with incident HF. We accounted for the competing risk of death using the Fine-Gray subdistribution model and evaluated the discrimination using Harrell's C-statistic. The pooled cohort equations to prevent HF (PCP-HF) were used as a comparator for estimating incident HF risk. Results: Among 231,285 individuals at YNHHS, 4472 had a primary HF hospitalization over 4.5 years (IQR 2.5-6.6) of follow-up. In UKB and ELSA-Brasil, among 42,741 and 13,454 people, 46 and 31 developed HF over a follow-up of 3.1 (2.1-4.5) and 4.2 (3.7-4.5) years, respectively. A positive AI-ECG screen portended a 4-fold higher risk of incident HF among YNHHS patients (age-, sex-adjusted HR [aHR] 3.88 [95% CI, 3.63-4.14]). In UKB and ELSA-Brasil, a positive-screen ECG portended 13- and 24-fold higher hazard of incident HF, respectively (aHR: UKB, 12.85 [6.87-24.02]; ELSA-Brasil, 23.50 [11.09-49.81]). The association was consistent after accounting for comorbidities and the competing risk of death. Higher model output probabilities were progressively associated with a higher risk for HF. The model's discrimination for incident HF was 0.718 in YNHHS, 0.769 in UKB, and 0.810 in ELSA-Brasil.
Across cohorts, incorporating model probability with PCP-HF yielded a significant improvement in discrimination over PCP-HF alone. Conclusions: An AI model applied to images of 12-lead ECGs can identify those at elevated risk of HF across multinational cohorts. As a digital biomarker of HF risk that requires just an ECG image, this AI-ECG approach can enable scalable and efficient screening for HF risk.

9

Deep learning on 3D ECG geometry predicts ischemia

Bermejo Valdes, A. J.

2025-10-30 cardiovascular medicine 10.1101/2025.10.29.25339046 medRxiv

Top 0.1%

26.7%

Background: Three-dimensional (3D) electrocardiography (ECG) is a recent methodological advance that extends the dimensionality of the standard ECG, enabling geometric descriptors that capture acute ischemia. Integrating these descriptors with deep learning (DL) may improve the discrimination between ischemic and non-ischemic states and promote the clinical translation of 3D ECG analysis. Methods: ECGs from seventeen patients with acute left anterior descending (LAD) artery stenosis (>50%) were obtained from the PTB Diagnostic ECG Database (PhysioNet). Pre- and post-catheterization recordings were analyzed in 2D and 3D (V3, V6, time) over the interval from QRS end to T onset. Geometric descriptors included perimeter, curvature, three almost-curvature variants, and a newly defined torsion metric. Statistical analyses comprised univariate, bivariate, and multivariate tests (PERMANOVA), complemented by DL classification using a residual multilayer perceptron with patient-wise cross-validation, isotonic calibration, and logistic meta-blending, adopting a significance level of α = 0.01 (99% confidence) to ensure inference stability given the limited sample size. Results: Four descriptors changed significantly after revascularization (P_{2D,V6t}, κ_{2D,V6t}, a descriptor indexed {3D,2} whose symbol is missing in the source, and τ). Correlation analyses indicated redundancy among curvature-related metrics, whereas torsion provided independent information. PERMANOVA confirmed that torsion alone, and only metric sets including torsion, achieved significance (p < 0.05). The torsion-based DL model provided the best discrimination, with an area under the ROC curve of 0.76 (99% CI, 0.57-0.94; p < 0.001), specificity 0.82, and a Brier score of 0.18. Conclusions: The integration of torsion into a DL-based 3D ECG framework enhanced the detection of acute ischemia, increasing diagnostic specificity and improving early triage and clinical decision-making in acute cardiac care.

10

Deep Learning-enabled Detection of Aortic Stenosis from Noisy Single Lead Electrocardiograms

Aminorroaya, A.; Dhingra, L. S.; Sangha, V.; Oikonomou, E. K.; Khunte, A.; Shankar, S. V.; Camargos, A. P.; Haynes, N.; Hofer, I.; Ouyang, D.; Nadkarni, G.; Khera, R.

2023-10-02 cardiovascular medicine 10.1101/2023.09.29.23296310 medRxiv

Top 0.1%

26.6%

Background: Due to the lack of a feasible screening strategy, aortic stenosis (AS) is often diagnosed after the development of clinical symptoms, representing advanced stages of disease. Portable and wearable devices capable of recording electrocardiograms (ECGs) can be used for scalable screening for AS, if the diagnosis can be made with a single-lead ECG, despite potentially noisy acquisition. Methods: Using electronic health records and imaging data from a large, diverse hospital system (2015-2022), we developed a deep learning-based approach to detect moderate/severe AS using a single-lead ECG. We used ECGs paired with echocardiograms obtained within 30 days of each other to develop the model. We extracted lead I signal data from clinical ECGs and augmented it with random Gaussian noise. We trained a convolutional neural network (CNN) to identify TTE-confirmed AS using noisy single-lead ECGs. Finally, we used the CNN model probabilities, along with patient age and sex, as predictive inputs to train an extreme gradient boosting (XGBoost) model to detect moderate/severe AS. Results: The model was developed in 75,901 ECGs/35,992 patients (median age 61 [interquartile range (IQR) 47-72] years, 54.3% women, 9.5% Black) and validated in 3,733 patients (median age 61 [IQR 47-72] years, 53.4% women, 9.7% Black). In the held-out validation set, the ensemble XGBoost model achieved an AUROC of 0.829 (95% CI: 0.800-0.855), with a sensitivity of 90.4% and specificity of 58.7% for detecting moderate/severe AS. For detecting severe AS, the model's AUROC was 0.846 (95% CI, 0.778-0.899), with a sensitivity of 94.3% and specificity of 57.0%. In the test set with a 4.5% prevalence of moderate/severe AS, the model had a PPV of 9.3% and an NPV of 99.2%. In simulated cohorts with 1% and 20% prevalence of moderate/severe AS, the model's NPVs varied from 99.8% to 96.1%, and PPVs from 2.2% to 35.4%, respectively.
ConclusionWe developed a novel portable- and wearable-adapted deep learning approach for the detection of moderate/severe AS from noisy single-lead ECGs. Our approach represents a highly sensitive, feasible, and scalable strategy for community-based AS screening.
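Worked example (not part of the preprint): the predictive values quoted in the abstract above follow from Bayes' rule once sensitivity, specificity, and prevalence are fixed. The sketch below assumes the moderate/severe AS operating point (sensitivity 90.4%, specificity 58.7%) stays constant across cohorts; the function name `predictive_values` is a hypothetical helper, not the authors' code.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary screening test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Operating point quoted in the abstract; assumed fixed across prevalences.
SENSITIVITY = 0.904
SPECIFICITY = 0.587

if __name__ == "__main__":
    for prevalence in (0.01, 0.045, 0.20):
        ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under that assumption the three prevalences reproduce the reported PPV and NPV figures to rounding, which illustrates why a high-sensitivity screen keeps a high NPV while its PPV depends strongly on how common the disease is in the screened population.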

11

Beyond Doppler: Scalable AI Detection of LVOT Obstruction in HCM

Crystal, O.; Farina, J. M. M.; Scalia, I. G.; Ayoub, C.; Park, H.-B.; Kim, K. A.; Arsanjani, R.; Lester, S. J.; Banerjee, I.

2026-04-20 cardiovascular medicine 10.64898/2026.04.17.26351151 medRxiv

Top 0.1%

25.9%

Background: Accurate assessment of left ventricular outflow tract (LVOT) gradients is critical for hypertrophic cardiomyopathy (HCM) management, yet Doppler-based measurements are technically demanding and require expertise. Objective: To develop a multi-view deep learning model capable of classifying LVOT obstruction (>20 mmHg) using routine 2D echocardiographic windows without reliance on Doppler imaging. Methods: We trained and externally validated a cross-attention-based video-to-video fusion framework that integrated EchoPrime-derived video representations from three standard transthoracic echocardiographic views to classify LVOT gradients. Results: Training was performed on a derivation cohort (N = 1833) from a tertiary care system in the United States, with model performance evaluated on an internal held-out test set (N = 275) and a Korean external validation cohort (N = 46). Single-view baselines showed limited discrimination (external AUROCs 0.47-0.70). Conversely, the domain-specific foundational model (EchoPrime) achieved superior single-view performance (AUROCs 0.75-0.80 internal; 0.79-0.83 external), highlighting the importance of echo-specific pretraining and temporal modeling. The proposed multi-view fusion further enhanced predictive performance, with the late fusion model reaching an AUROC of 0.84 on the external cohort with significant population shift. Conclusions: These results suggest LVOT physiology is encoded in routine 2D imaging and can be leveraged for clinically relevant gradient classification without Doppler input. The proposed AI-guided strategy demonstrates substantial cost savings compared with the screen-all approach. By integrating complementary spatial-temporal information across multiple views, our approach generalizes robustly across populations and may enable real-time decision support, extend LVOT assessment to portable or resource-limited settings, and complement Doppler-based evaluation for longitudinal HCM management.

12

AI-ECG for LVSD detection: a systematic review and first-in-kind multinational head-to-head comparison

Croon, P. M.; Boonstra, M. J.; Allaart, C. P.; Arends, B. K. O.; Dhingra, L. S.; Huang, Y.-C.; Mast, T.; Khera, R.; Kuo, C.-F.; Kwon, J.-M.; Lee, H. S.; Lee, M. S.; van de Leur, R.; Liu, Z.-Y.; Oikonomou, E. K.; Selder, J. L.; Winter, M. M.; Asselbergs, F. W.

2025-07-11 cardiovascular medicine 10.1101/2025.07.08.25331129 medRxiv

Top 0.1%

25.7%

Background: Several artificial intelligence-enhanced electrocardiogram (AI-ECG) models have shown promise in detecting left ventricular systolic dysfunction (LVSD), but their head-to-head agreement and performance have not been independently compared within the same cohort. Objectives: To compare the performance of published AI-ECG models for LVSD detection in a standardized external cohort and evaluate the field's transparency and reproducibility. Methods: We systematically reviewed AI-ECG models predicting LVSD and assessed the risk of bias. Authors were invited to share models for external validation in a well-phenotyped registry of patients undergoing routine clinical cardiac magnetic resonance imaging (CMR) with cardiologist-adjudicated reports and paired ECGs. Model performance was evaluated in all consecutive patients and a lower-complexity subgroup with 15% LVSD prevalence. Results: We identified 35 studies describing 51 models, reporting high (AUROC >0.80) or excellent (AUROC >0.90) performance. The risk of bias was high and was primarily attributed to the limited description of development and validation cohort characteristics, as well as the lack of independent external validation. Four groups (from Korea, the United States, Taiwan, and the Netherlands) shared models for independent testing. AUROCs ranged from 0.83 to 0.93 in all patients (n = 1,203; mean age 59 ± 15 years; 450 [35%] female) and from 0.87 to 0.96 in the lower-complexity subset. Performance remained consistent across subgroups, with slight decreases in ECGs showing wide QRS complexes or atrial fibrillation. Conclusions: In this first-in-kind independent validation and head-to-head comparison study, AI-ECG for LVSD detection demonstrated strong performance despite training on disparate populations. However, the limited availability of models hinders independent validation.

13

Contrastive Multi-modal Training with Electrocardiography and Natural Language Echocardiography Reports for Zero-shot Prediction of Structural Heart Disease

WONG, W.-C.; LIU, C.; ELIAS, P.; HUGHES, J. W.; LEUNG, C.-Y.; QIAN, X.-Y.; LI, H.-L.; LAU, Y.-M.; TAO, C.-F.; CHOO, A.; YUNG, C.-H.; FONG, C.-H.; CHOI, W.-K.; CHENG, C.-K.; CHENG, L.-L.; LAU, L.-M.; RELWANI, R.; QIN, J.; YU, L.; LUI, H.-W.; CHIU, H.-O. A. C.; TSE, H.-F.; SIU, C.-W.; ARCUCCI, R.; HO, J. W.-K.; WONG, C.-K.

2025-09-18 cardiovascular medicine 10.1101/2025.09.16.25335870 medRxiv

Top 0.1%

25.7%

Background: Machine learning models for predicting structural heart disease (SHD) from electrocardiography (ECG) traditionally required structured echocardiographic data. The potential of echocardiography (ECHO) natural language reports remains underused. We describe MERL-ECHO, a multimodal model using contrastive language-image pre-training (CLIP) that aligns ECG with ECHO natural language reports for zero-shot SHD prediction. Methods: We conducted a multi-center retrospective study using paired ECG and ECHO natural language reports from Queen Mary Hospital and Tung Wah Hospital in Hong Kong. MERL-ECHO was trained on 45,016 ECG-ECHO pairs. Performance was evaluated on an internal test set covering 10 SHDs and on an external test set of 5,442 ECGs with ECHO-derived labels for 6 SHDs from Columbia University Irving Medical Center, USA. Results: The cohort included 8,192 patients (mean age 73.7 ± 16.5 years; 55.3% male). In the internal test set, MERL-ECHO achieved an average AUROC of 0.69, with strongest performance for left ventricular dilation (0.78), right ventricular systolic dysfunction (0.71), and tricuspid regurgitation (0.71). In the external test set, the average AUROC was 0.72, with highest performance for left ventricular systolic dysfunction (0.76) and aortic stenosis (0.76). Pre-training improved AUROC by up to 5%, performance scaled with larger datasets, and ResNet18 outperformed ViT-Tiny as the ECG encoder by 7%. Saliency analysis revealed interpretable ECG features, including unexpected P-wave changes in aortic stenosis, suggesting novel disease markers. Conclusions: MERL-ECHO leverages ECHO natural language reports for multimodal training with ECG. This CLIP-based model enables accurate zero-shot prediction of SHDs and highlights interpretable ECG features with potential clinical relevance.

14

Echocardiography-Based, Artificial Intelligence-Enabled Electrocardiography (AI-ECG) for Diastolic Hemodynamics Phenotyping in Acute Heart Failure (AHF)

Wong, Y. W.; Abbasi, M.; Lee, E.; Tsaban, G.; Attia, Z. I.; Friedman, P. A.; Noseworthy, P. A.; Lopez-Jimenez, F.; Chen, H. H.; Lin, G.; Scott, L. R.; AbouEzzeddine, O. F.; Oh, J. K.

2026-03-06 cardiovascular medicine 10.64898/2026.03.05.26347763 medRxiv

Top 0.1%

25.5%

Background: Acute heart failure (AHF) exhibits marked heterogeneity in diastolic hemodynamics, yet comprehensive echocardiographic assessment of diastolic function (DF) and filling pressure (FP) is often infeasible. We evaluated whether artificial intelligence-enabled electrocardiography (AI-ECG) could provide scalable DF grading and FP estimation in hospitalized AHF patients. Methods: We retrospectively studied adults hospitalized for AHF across Mayo Clinic sites (2013-2023) who received ≥1 dose of intravenous loop diuretic and had paired 12-lead ECG and transthoracic echocardiography (TTE). The previously validated AI-ECG DF model was applied without retraining to generate four DF grades and a continuous FP probability. Clinical outcomes were all-cause mortality and heart failure rehospitalization. Associations with clinical severity markers and echocardiographic indices were examined. Kaplan-Meier survival analysis and adjusted multivariable Cox proportional hazards models were performed. Exploratory analyses examined the kinetics of change in FP probability and its impact on mortality. Results: Among 11,513 patients (median age 75 years, 39% female), AI-ECG DF grading was feasible in 100%, whereas echocardiographic DF was indeterminate in 44% of clinically eligible patients. In 2,582 patients with determinate echocardiographic DF, AI-ECG FP probability discriminated TTE Grade 2-3 dysfunction with AUC 0.85 (95% CI 0.83-0.86). Higher AI-ECG DF grades were associated with higher comorbidity burden, worse NYHA class, elevated NT-proBNP, higher MAGGIC scores, elevated PCWP, and more advanced structural remodeling. After multivariable adjustment, AI-ECG DF remained independently associated with mortality (hazard ratio [HR] 1.25, 95% CI 1.16-1.35 for Grade 2; HR 1.44, 95% CI 1.33-1.56 for Grade 3 versus Normal/Grade 1). Combining AI-ECG DF with MAGGIC scores yielded ordered risk gradients, with highest mortality in patients with both high MAGGIC and Grade 2-3 DF. Among patients with serial ECGs, improvement in FP probability was independently associated with lower mortality (HR 0.85, 95% CI 0.79-0.91), whereas worsening did not show a consistent adverse gradient beyond baseline DF. Conclusions: In a large, geographically diverse AHF cohort, AI-ECG DF grading was universally feasible, correlated with established hemodynamic severity markers, and provided independent prognostic information beyond established risk factors, supporting its role as a pragmatic, scalable diastolic biomarker in AHF.
Clinical Perspective
What Is New?
- In 11,513 hospitalized acute heart failure (HF) patients, artificial intelligence-enabled electrocardiography provided diastolic function grading in 100% of patients from a single 12-lead ECG without requiring additional clinical variables, compared with 56% feasibility for guideline-based echocardiography grading.
- AI-ECG diastolic function grades correlated with established markers of severity (NYHA functional class, NT-proBNP, MAGGIC risk scores, and pulmonary capillary wedge pressure) and remained independently associated with both mortality and HF rehospitalization after multivariable adjustment.
- Serial AI-ECG measurements identified post-discharge filling pressure trajectories, with improvement independently associated with 15% lower mortality, a first demonstration that longitudinal ECG assessment can track post-discharge hemodynamic recovery.
What Are the Clinical Implications?
- AI-ECG transforms the universally obtained 12-lead ECG into an actionable hemodynamic biomarker that addresses the critical gap when echocardiographic diastolic function assessment is indeterminate or unavailable in acute HF patients.
- Despite markedly different hemodynamic severity and long-term outcomes across AI-ECG diastolic function grades, hospitalization length of stay did not differ, suggesting advanced diastolic dysfunction represents occult risk not easily recognized during routine acute care and highlighting the need for improved post-discharge risk stratification.
- The continuous filling pressure probability metric enables longitudinal monitoring of post-discharge hemodynamic status using serial routine ECGs, potentially identifying patients requiring intensified follow-up or specialist referral.

15

Foundation models for generalizable electrocardiogram interpretation: comparison of supervised and self-supervised electrocardiogram foundation models

Nolin-Lapalme, A.; Sowa, A.; Delfrate, J.; Tastet, O.; Corbin, D.; Kulbay, M.; Ozdemir, D.; Noel, M.-J.; Marois-Blanchet, F.-C.; Harvey, F.; Sharma, S.; Ansari, M.; Chiu, I.-M.; Dsouza, V.; Friedman, S. F.; Potter, B.; Chasse, M.; Afilalo, J.; Elias, P. A.; Jabbour, G.; Bahani, M.; Dube, M.-P.; Boyle, P. M.; Chatterjee, N. A.; Barrios, J.; Tison, G. H.; Ouyang, D.; Maddah, M.; Khurshid, S.; Cadrin-Tourigny, J.; Tadros, R.; Hussin, J.; Avram, R.

2025-03-05 cardiovascular medicine 10.1101/2025.03.02.25322575 medRxiv

Top 0.1%

23.7%

Background: The 12-lead electrocardiogram (ECG) remains a cornerstone of cardiac diagnostics, yet existing artificial intelligence (AI) solutions for automated interpretation often lack generalizability, remain closed-source, and are primarily trained using supervised learning, limiting their adaptability across diverse clinical settings. To address these challenges, we developed and compared two open-source foundational ECG models: DeepECG-SSL, a self-supervised learning model, and DeepECG-SL, a supervised learning model. Methods: Both models were trained on over 1 million ECGs using a standardized preprocessing pipeline and automated free-text extraction from ECG reports to predict 77 cardiac conditions. DeepECG-SSL was pretrained using self-supervised contrastive learning and masked lead modeling. The models were evaluated on six multilingual private healthcare systems and four public datasets for ECG interpretation across 77 diagnostic categories. Fairness analyses assessed disparities in performance across age and sex groups, while also investigating fairness and resource utilization. Results: DeepECG-SSL achieved AUROCs of 0.990 (95% CI 0.990, 0.990) on the internal dataset, 0.981 (95% CI 0.981, 0.981) on external public datasets, and 0.983 (95% CI 0.983, 0.983) on external private datasets, while DeepECG-SL demonstrated AUROCs of 0.992 (95% CI 0.992, 0.992), 0.980 (95% CI 0.980, 0.980), and 0.983 (95% CI 0.983, 0.983), respectively. Fairness analyses revealed minimal disparities (true positive rate and false positive rate difference <0.010) across age and sex groups. For digital biomarker prediction (long QT syndrome (LQTS) classification, 5-year atrial fibrillation prediction, and left ventricular ejection fraction (LVEF) classification) with limited labeled data, DeepECG-SSL outperformed DeepECG-SL in predicting 5-year atrial fibrillation risk (N=132,050; AUROC 0.742 vs. 0.720; Δ=0.022; P<0.001), identifying reduced LVEF ≤40% (N=25,252; 0.928 vs. 0.900; Δ=0.028; P<0.001), and classifying LQTS subtypes (N=127; 0.931 vs. 0.853; Δ=0.078; P=0.026). Conclusion: By releasing model weights, preprocessing tools, and validation code, we aim to support robust, data-efficient AI diagnostics across diverse clinical environments. This study establishes self-supervised learning as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, enhancing accessibility, generalizability, and fairness in AI-driven cardiac diagnostics.
Key Question: Can self-supervised learning (SSL) yield ECG-based AI foundational models with enhanced performance, fairness, privacy, and generalizability compared to traditional supervised learning (SL) approaches?
Key Finding: Our evaluation of DeepECG-SL and DeepECG-SSL across seven external health center datasets and four international publicly accessible datasets demonstrated that while both models achieve comparable diagnostic accuracy for ECG interpretation, SSL outperforms SL on novel tasks with smaller datasets.
Take-home Message: We validated DeepECG-SL and DeepECG-SSL across public and private datasets and demonstrated that the SSL model had superior generalizability. By addressing fairness, privacy, and efficiency, and by open-sourcing our models, we advance ethical, adaptable AI for equitable, real-world ECG diagnostics.
Graphical abstract: DeepECG-SL and DeepECG-SSL, two open-source AI models for 12-lead ECG interpretation, were trained on over 1 million ECGs. DeepECG-SSL, utilizing self-supervised contrastive learning and masked lead modeling, outperformed DeepECG-SL in utilizing digital biomarkers to predict atrial fibrillation risk, reduced LVEF, and long QT syndrome subtypes, while both models achieved high diagnostic accuracy with minimal fairness disparities across age and sex. Validated on ten external datasets, our work provides a robust, reproducible framework for equitable, efficient ECG-based cardiac diagnostics.

16

Predicting Near-term Mortality in Heart Failure: External Validation of Electronic Health Record-Based Deep Learning Model

McGilvray, M. M. O.; Pawale, A.; Roberts, S.; Shepherd, H. M.; Wilcox, A.; Heaton, J.; Pasque, M. K.

2025-08-15 cardiovascular medicine 10.1101/2025.08.13.25333636 medRxiv

Top 0.1%

23.3%

Background: The dire consequences of heart failure (HF) patient non-response to guideline-directed medical therapy often fuel early, non-selective referral for surgical intervention (ventricular assist device [VAD] or transplant). The high risk associated with these interventions mandates precision in directing them only toward those patients who would otherwise suffer severe near-term deterioration. We previously reported a 52,265-patient deep learning model that predicted 1-year severe decompensation/death in HF inpatients, with a C-statistic of 0.91. We now present external model validation. Few groups applying deep learning to large-scale datasets have achieved external validation using equally large-scale independent datasets, yet proof of generalization is essential to practical applicability. Methods: Our previous study used standard electronic health record (EHR) data to build ensemble deep learning models employing time-series and densely connected networks. The positive class included both all-cause mortality and referral for HF surgical intervention within 1 year. In the current study, we assessed generalization of model architecture in an external validation test set from the Veterans Cardiac Health and Artificial Intelligence Model Predictions (V-CHAMPS) challenge, a synthetic national governmental sample using a distinct EHR system. While V-CHAMPS is a robust dataset, variables that capture VAD/transplant referral were not readily extracted, limiting the positive class to mortality only. Results: A total of 380,441 distinct admissions from 75,086 HF patients contributed >720 million EHR datapoints. 23% of observations fit positive-class criteria. The model C-statistic in the external-validation cohort was 0.79. Conclusions: Despite being developed in a single-center dataset with a more precise positive class, our model architecture maintained relative accuracy when applied to a national sample in an unrelated EHR system. This supports the clinical relevance of the deep learning model and its adaptability, with retraining, to disparate contexts. This broad applicability suggests considerable potential of EHR-based deep learning models to assist HF clinicians in improving the usage of advanced surgical therapy.

17

Translating Deep Learning to Clinical Practice: External Validation and Clinical Benefit of an Electrocardiogram-Based Neural Network for Detecting Low Ejection Fraction

Hakim, N.; Lee, G.; Vidmar, D. M.; Wagner, L. P.; Trachtenberg, M.; Haggerty, C. M.; Fornwalt, B. K.; Pfeifer, J.

2025-10-08 cardiovascular medicine 10.1101/2025.10.02.25336997 medRxiv

Top 0.1%

23.2%

Low ejection fraction (EF), an indicator of impaired heart function, often goes undiagnosed and can lead to avoidable heart failure and arrhythmias. We developed and externally validated a deep learning model for detecting low EF from 12-lead electrocardiograms. The model achieved 85.8% sensitivity and 83.0% specificity on an independent validation cohort, with consistent results across demographic subgroups. These findings supported FDA 510(k) clearance of the model. Clinical net benefit analysis further showed that the model provides greater clinical value than default screening approaches, confirming its meaningful potential impact for clinical practice.

18

Deep Learning-Based Automated Echocardiographic Measurements in Pediatric and Congenital Heart Disease

Lukyanenko, P.; Ghelani, S. J.; Yang, Y.; Jiang, B.; Miller, T.; Higgins, P.; Kirakosian, M.; Tracy, K.; Kane, J.; Harrild, D. M.; Triedman, J.; Powell, A.; Geva, T.; La Cava, W.; Mayourian, J.

2026-02-09 cardiovascular medicine 10.64898/2026.02.06.26345782 medRxiv

Top 0.1%

23.1%

Background: Echocardiography (echo) is a cornerstone of pediatric cardiology, yet access to expert interpreters is limited worldwide, particularly in low-resource and rural settings. Artificial intelligence (AI) offers a mechanism to broadly deliver expert-level precision and standardize measurements, yet AI for comprehensive automated measurements in pediatric and congenital heart disease (CHD) echo remains underdeveloped. Methods: We created EchoFocus-Measure, an AI platform that automatically extracts 18 quantitative parameters and 10 qualitative assessments from full echo studies. The method extends a multi-task, view-agnostic architecture (PanEcho) with a study-level transformer to prioritize diagnostically informative views. Training (80%) and internal testing (20%) were performed on echos from Boston Children's Hospital (BCH), with external evaluation on outside referral studies. Left ventricular ejection fraction (LVEF) was the primary endpoint. Results: The internal cohort included 11.4 million videos from 217,435 echos (60,269 patients; median age 8.5 years; median LVEF 61%), and external validation encompassed 289,613 videos from 3,096 echos (2,506 patients; median age 3.5 years; median LVEF 62%). For LVEF, EchoFocus-Measure exhibited a median absolute error (MAE) of 2.8% internally and 3.8% externally, maintaining accuracy across infants (MAE 3.2%) and complex CHD lesions (e.g., MAE 4.0% for L-loop transposition of the great arteries). EchoFocus-Measure improved upon the PanEcho benchmark (MAE 7.5% for infants; 13.1% for L-loop transposition). Discrepant case (>50th percentile error) adjudication of LVEF demonstrated that model errors (MAE 2.4%) were within human variability (MAE 3.7%). For qualitative measures, EchoFocus-Measure performed well internally (AUROC 0.88-0.95) and modestly externally (AUROC 0.73-0.86). Explainability analyses highlighted model focus on clinically appropriate echo views for LVEF estimation (apical four-chamber, parasternal short/long) and mitral regurgitation assessment (apical four-chamber color Doppler, parasternal short/long color Doppler). Conclusions: EchoFocus-Measure delivers rapid and reliable automated echo measurements across ages and lesions within diverse internal and real-world external cohorts, serving as a step toward scalable, global access to high-quality pediatric cardiovascular care.
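Illustrative note (not part of the preprint): the abstract above defines MAE as the median absolute error, which differs from the more common mean absolute error in that a few badly mispredicted studies barely move it. The sketch below uses made-up LVEF values purely for illustration; `absolute_error_summary` and the numbers are hypothetical and do not come from the study.

```python
from statistics import mean, median


def absolute_error_summary(predicted, reference):
    """Return (median, mean) of the absolute errors between paired values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return median(errors), mean(errors)


# Hypothetical LVEF values in percentage points (toy data, not study data).
reference_lvef = [61, 55, 62, 30, 58]
predicted_lvef = [59, 56, 65, 45, 58]

if __name__ == "__main__":
    med_err, mean_err = absolute_error_summary(predicted_lvef, reference_lvef)
    print(f"median absolute error: {med_err}")
    print(f"mean absolute error:   {mean_err}")
```

In this toy case one large miss pulls the mean error well above the median, so figures reported as median absolute error should not be compared directly with mean absolute errors from other studies.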

19

Study Protocol for the Pilot Evaluation for SMartphone-adaptable Artificial Intelligence for PRediction and DeTection of Left Ventricular Systolic Dysfunction (The SMART-LV Pilot Study Protocol)

Dhingra, L. S.; Aminorroaya, A.; Sangha, V.; Khunte, A.; Oikonomou, E. K.; Mortazavi, B. J.; McNamara, R.; Herrin, J.; Wilson, F. P.; Krumholz, H.; Khera, R.

2023-01-31 cardiovascular medicine 10.1101/2023.01.30.23285120 medRxiv

Top 0.1%

23.1%

Introduction: Despite a prevalence of 3-5% among adults, asymptomatic left ventricular systolic dysfunction (LVSD) remains underdiagnosed. There is a critical need for an accurate and widely accessible screening strategy for LVSD, given its association with preventable morbidity and premature mortality. A novel deep learning approach has demonstrated the ability to detect LVSD directly from ECG images, with retrospective validation across multiple institutions. There is a lack of prospective validation. In this pilot study, we evaluate the feasibility of screening and recruiting individuals for prospective echocardiography based on an image-based artificial intelligence (AI)-ECG algorithm applied to the ECG repository at a large academic medical center. Research Methods and Analysis: This is the protocol for a prospective cohort study in outpatient primary care clinics of the Yale New Haven Hospital (YNHH). Adult patients who have undergone a 12-lead ECG without subsequent echocardiogram as a part of routine clinical care within 90 days of the ECG will be identified in the electronic health record (EHR). The AI-ECG model for LVSD will be deployed to the YNHH ECG repository to define the probability of LVSD, identifying 5 patients each with high and low probability of LVSD. After discussion with primary care physicians, and subsequent contact by the study team, screened participants will be invited for and undergo an echocardiogram. The study participants and the cardiologists conducting the echocardiograms will be blinded to the results of the AI-ECG screen. The analysis will focus on feasibility metrics: the proportion (i) of all patients undergoing ECGs who have high probability of LVSD without subsequent echocardiogram, (ii) of patients who agree to participate in the study, and (iii) that undergo an echocardiogram. A descriptive exploration of the comparison of the AI-ECG and echocardiogram results will also be reported. Ethics and Dissemination: All patient EHR data required for assessing eligibility and conducting the AI-ECG screening will be accessed through secure servers approved for protected health information. Potential participants will only be contacted after they have discussed the study information with their primary care physician. All participants will be required to provide written informed consent before participation and data will be deidentified prior to analysis. This study protocol has been approved by the Yale Institutional Review Board (Protocol Number: 2000034006) and has been registered at ClinicalTrials.gov (Identifier: NCT05630170). The results of the future validation study will be published in peer-reviewed journals and summaries will be provided to the study participants.

20

Machine Learning Models Enhance Prediction of Arrhythmogenic Right Ventricular Cardiomyopathy

Quansah, K. K.; Murphy, S. A.; Kwon, E.; Anderson, E.; Carrick, R. T.; James, C. A.; Calkins, H.; Kwon, C.

2025-06-17 cardiovascular medicine 10.1101/2025.06.16.25329706 medRxiv

Top 0.1%

23.0%

Arrhythmogenic Right Ventricular Cardiomyopathy (ARVC) is a leading contributor to sudden cardiac death worldwide in young adults, yet its diagnosis remains complex, expensive, and time-consuming. Machine-learning (ML) classifiers offer a practical solution by delivering rapid, scalable predictions that can lessen dependence on expert interpretation and speed clinical decision-making. Here, we benchmarked six ML algorithms for ARVC detection using area under the curve (AUC) and accuracy as primary metrics. Gradient Boosted Trees outperformed all other models, achieving a c-statistic of 94.34% after rigorous cross-validation. These results underscore the promise of the Gradient Boosted Trees classifier as an effective decision-support tool within the ARVC diagnostic workflow, with potential to streamline evaluation and improve patient outcomes.